perm filename 154EQ.TEX[1,RWF] blob
sn#752250 filedate 1984-04-22 generic text, type C, neo UTF8
COMMENT ⊗ VALID 00002 PAGES
C REC PAGE DESCRIPTION
C00001 00001
C00002 00002 \magnification\magstephalf
C00018 ENDMK
C⊗;
\magnification\magstephalf
\input macro.tex
\def\today{\ifcase\month\or
January\or February\or March\or April\or May\or June\or
July\or August\or September\or October\or November\or December\fi
\space\number\day, \number\year}
\baselineskip 14pt
\def\fnc#1{\mathop{\rm #1}\nolimits}
\rm
\line{\sevenrm langeq.tex[v1,phy] \today\hfill}
\def\zap{\mathrel{\vbox{\lineskip1pt\baselineskip1pt
\halign{\ctr{$##$}\cr
\scriptscriptstyle L\cr \sim\cr}}}}
\def\zzap{\mathrel{\vbox{\lineskip1pt\baselineskip1pt
\halign{\ctr{$##$}\cr
\scriptscriptstyle L\cr \approx\cr}}}}
\def\zapp{\mathrel{\vbox{\lineskip1pt\baselineskip1pt
\halign{\ctr{$##$}\cr
\scriptscriptstyle L↓2\cr \sim\cr}}}}
\def\zappp{\mathrel{\vbox{\lineskip1pt\baselineskip1pt
\halign{\ctr{$##$}\cr
\scriptscriptstyle L↓3\cr \sim\cr}}}}
\def\footnote#1#2{#1botinsert{\hrule width5pc \vskip3pt\baselineskip9pt
\hbox par size{\eightpoint#1#2}}}
\bigskip
\line{CS 154\hfill Spring 1983}
\line{Prof. R.W. Floyd\hfill}
\bigskip\noindent
If $L$ is a language over $\Sigma↑{\ast}$, it can be used to define two equivalence
relations. The first relation makes two strings equivalent if they can be
used interchangably anywhere; the second if they can be used interchangably
at the beginnings of strings. We call the first $\zzap$, and the
second $\zap$.
We define $\zzap$ (infix equivalence) by saying $x\zzap y$ if replacing an $x$ by a
$y$, or a $y$ by an $x$, in any string of $L$, results in a string of $L$; in
a formula
$$(x \zzap y)≡(∀u,v\in\Sigma↑{\ast})(uxv\in L≡uyv\in L).$$
Since replacing $x$ by itself does not change it, $x \zzap x$, and
$\zzap$ is reflexive. By the symmetry of $≡$ in the definition,
$x \zzap y≡y \zzap x$, and $\zzap$ is symmetric. If
$x \zzap y$ and $y \zzap z$, then $uxv\in L⊃uyv\in L⊃uzv\in L$, etc., so
$\zzap$ is transitive.
We define $\zap$ (prefix equivalence) by saying $x \zap y$ if replacing an $x$ by a
$y$ or a $y$ by an $x$
at the beginning of a string of $L$ results in a string of $L$;
in a formula, $$(x \zap y)≡(∀v)(xv\in L≡yv\in L).$$
As with $\zzap$, $\zap$ is reflexive, symmetric, and transitive; both
are equivalence relations; $\zzap$ is often called {\sl congruence}. They have
some closure properties under concatenation:
\medskip
\disleft 20pt:(1):If $x \zzap y$ and $u \zzap v$, then $xu \zzap yv$.
\smallskip
\disleft 20pt:(2):If $x \zap y$, and $u \zzap v$, then $xu \zap yv$.
\smallskip
\disleft 20pt:(3):If $x \zap y$, and $u\in\Sigma↑{\ast}$, then $xu \zap yu$.
(A special case of (2).)
\smallskip
\disleft 20pt:(4):If for all $u$, $ux \zap uy$, then $x \zzap y$.
\medskip\noindent
By substituting $\epsilon$ for $u$ in the definition of $\zzap$, we see that
$x \zzap y$ implies $x \zap y$; that is, $\zzap$ is
{\sl stronger\/} than $\zap$, or $\zap$ is {\sl weaker\/} than $\zzap$.
As with all equivalence relations, $\approx$ and $\sim$ (we omit the $L$ if no
ambiguity arises) imply a partition of $\Sigma↑{\ast}$ into equivalence classes.
If $C$ is an equivalence class, and $x$ and $y$ are two members of $C$, then
$x≡y$, so $x≡u$ implies $y≡u$, etc., and the members of $C$ can, for many
purposes, be used interchangably.
Let $\hbox{Cong}(x)=\{\,y\mid x\approx y\,\}$, and
$\hbox{Eq}(x)=\{\,y\mid x\sim y\,\}$. Let {\bf E} be the set of equivalence
classes
and~{\bf C} the set of congruence classes. We call the number (finite or
infinite) of equivalence classes the {\sl index\/} of an equivalence relation.
\vfill\eject
\noindent
{\sl Theorem\/}: A language $L$ is an FAL if and only if $\zzap$ has finite index.
\medskip\noindent
Proof:
\smallskip
(1)\xskip FAL implies finite index. Let $M$ be a
DFA$(Q,\Sigma ,\delta,q↓0,F)$ accepting
$L$. For any string~$x$ in~$\Sigma↑{\ast}$,
let $\delta ↓x$ be the function $Q→Q$ defined
by $\delta ↓x(q)=\delta(q,x)$; if $n=|Q|$, there are at most $n↑n$ such functions.
If $\delta ↓x$ and $\delta ↓y$ are the same function, then $x\approx y$; in fact,
$$uxv\in L≡\delta ↓v\bigl(\delta ↓x\bigl(\delta ↓u(q↓0)\bigr)\bigr)
\in F≡\delta ↓v\bigl(\delta ↓y\bigl(\delta ↓u(q↓0)\bigr)\bigr)
\in F≡uyv\in L\,.$$
If we could find more than $n↑n$ non-equivalent
strings, their transition functions would all be different, which is not possible
(pigeonhole principle).
(2)\xskip Finite index implies FAL. Let $L$ be a language,
$\zzap$ of finite index.
Define $M=\bigl(\hbox{\bf E},\Sigma,\delta,$
$\hbox{Eq}(\epsilon),\ F\bigr)$,
where $\delta$ is defined by $\delta\bigl(\hbox{Cong}(x),\ a\bigr)
=\hbox{Cong}(xa)$, and
$F=\{\,\hbox{Cong}(x)\mid x\in L\,\}$.
\smallskip
\noindent
For the student: confirm that $\delta$ is well-defined, and that $M$
accepts~$L$.
\medskip\noindent
{\sl Theorem\/}: A language is an FAL if and only if $\zap$ has finite index.
\disleft 20pt:(1):FAL implies finite index. Let $M$ be as usual.
Let $q(x)=\delta(q↓0,x)$.
If $n=|Q|$, there are at most $n$ different values of $q(x)$. If $q(x)=q(y)$,
then $x\sim y$; in fact
$$xv\in L≡\delta\bigl(q(x),v\bigr)\in F≡\delta\bigl(q(y),v\bigr)\in F≡yv\in L\,.$$
If we could find more than $n$ non-congruent strings, their values of $q(x)$
would all be different, which is not possible.
\disleft 20pt:(2):Finite index implies FAL. Left as an exercise.
\medskip\noindent
{\sl Theorem\/}: The minimum number of states required by a DFA for $L$ is the
index of $\zap$; left as an exercise.
\medskip\noindent
{\sl Example\/}: Let $L↓1=\{A↑iB↑i,i≥1\}$. Then $x \zzap y$ if $x=y$, and also if:
\smallskip
\disleft 20pt:(1):both contain $BA$
\smallskip\noindent
or
\smallskip
\disleft 20pt:(2):$x=A↑iB↑j,y=A↑kB↑l,i≥1,j≥1,k≥1,l≥1,$ and $i-j=k-l$.
\smallskip\noindent
Also, $x \zap y$ if $x=y$, and also if
\smallskip
\disleft 20pt:(1):each contains $BA$ or has more $B$'s than~$A$'s.
\smallskip\noindent
or
\smallskip
\disleft 20pt:(2):$x=A↑iB↑j,y=A↑kB↑l,i≥j≥1,k≥l≥1$, and $i-j=k-l$.
\medskip\noindent
{\sl Example\/}: Let $L↓2$ be the set of palindromes over $\{A,B\}$.
Then $x \zzap y$, and $x \zapp y$, only if $x=y$.
\smallskip\noindent
Languages $L↓1$ and $L↓2$ have infinite index, so they are not FAL's.
\medskip\noindent
{\sl Example\/}: Let $L↓3$ be the set of strings over $\{A,B\}$ containing $ABA$.
Then $x \zappp y$ if:
\smallskip
\disleft 20pt:(1):Both contain $ABA$ or
\smallskip
\disleft 20pt:(2):Neither contains $ABA$, and each ends in $A$, or
\smallskip
\disleft 20pt:(3):Neither contains $ABA$, and each ends in $AB$ or
\smallskip
\disleft 20pt:(4):Neither contains $ABA$, and each is $\epsilon$,
or $B$, or ends in $BB$.
\vfill\eject
\noindent
Thus there are four equivalence classes, corresponding to this minimal $FA$:
\figbox 1.5truein:
\noindent
Hard exercise: define $\zappp$.
Because $x\approx y$, $u\approx v$ implies $xu\approx yv$, we can define
concatenation on equivalence classes; $\hbox{Cong}(x)\oplus
\hbox{Cong}(y)=\hbox{Cong}(xy)$.
\medskip\noindent
{\sl Exercise\/}: show this uniquely defines $\oplus$. (We will omit
$\oplus$ when we
feel like it.) The equivalence classes of a FAL form a finite semigroup under
concatenation (i.e., $\oplus$ is associative), with identity element
$\hbox{Eq}(\epsilon)$.
Conversely, if $\Sigma$ is a set of elements of a finite semigroup, with
operation~$*$,
and $S$ is any subset of the semigroup, there is an FA that reads
$a↓1a↓2\cdots a↓n\epsilon\Sigma↑{\ast}$, accepting iff
$a↓1*a↓2\cdots *a↓nεS$.
The FA
need only keep track of the semigroup product of the characters it has read.
Summing up, we have a large set of equivalent definitions of FAL-ness:
\smallskip
\disleft 20pt:(1):$L$ is accepted by a DFA
\smallskip
\disleft 20pt:(2):$L$ is accepted by a NFA
\smallskip
\disleft 20pt:(3):$L$ is regular (defined by a regular expression)
\smallskip
\disleft 20pt:(4):$L$ has a prefix equivalence relation of finite index
\smallskip
\disleft 20pt:(5):$L$ has an infix equivalence relation of finite index
\smallskip
\disleft 20pt:(6):There is a homomorphism $h$ from $\Sigma↑{\ast}$ to a
finite semigroup $S$ where
$x\in L$ if and only if $h(x)$ belongs to a particular subset of $S$.
\medskip\noindent
{\sl Theorem\/}: $L$ is an FAL if and only if there is a number $n$ such that
every string of length $n$ is equivalent to some shorter string.
\medskip
Thus, for $L$ an FAL, we can give a finite list of replacement rules $(x,y)$,
$|x|>|y|$, such that to test whether $z\in L$, we make allowed replacements until no
more are possible, and then test for membership in a finite set.
\medskip\noindent
{\sl Example\/}: $L↓3$ as defined above. A sufficient set of replacement rules is:
\noindent
$(BABA,ABA)$
\noindent
$(ABAB,ABA)$
\noindent
$(BBB,BB)$
\noindent
$(AA,A)$
\noindent
$(ABBA,A)$
\vfill\eject
\noindent
{\sl Corollary\/}:
If for each $n≥0$ there is a string of length $n$ not infix equivalent for $L$ to
any shorter string, then $L$ is not a FAL.
\medskip\noindent
{\sl Corollary\/}:
If $L$ is not an FAL, then there is an infinite sequence of characters
$a↓1a↓2a↓3\cdots$, for which the finite prefixes $a↓1a↓2\cdots a↓n$ are all infix
inequivalent for L. Any program to recognize $L$ must go into a new state
at each character as it reads this sequence. Then for each $n$, there is some
input of length $n$ that makes it use at least $\lceil \lg (n)\rceil$
bits of storage.
\vfill\eject\end